11 research outputs found

    Efficient Decomposed Learning for Structured Prediction

    Full text link
    Structured prediction is the cornerstone of several machine learning applications. Unfortunately, in structured prediction settings with expressive inter-variable interactions, exact inference-based learning algorithms, e.g., Structural SVM, are often intractable. We present a new approach, Decomposed Learning (DecL), which performs efficient learning by restricting the inference step to a limited part of the structured space. We provide characterizations based on the structure, target parameters, and gold labels under which DecL is equivalent to exact learning. We then show that in real-world settings, where our theoretical assumptions may not completely hold, DecL-based algorithms are significantly more efficient and as accurate as exact learning. Comment: ICML 2012
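
    To make the decomposition concrete, here is a minimal, hedged sketch in Python (illustrative names only, not the authors' code) of the core idea: during learning, loss-augmented inference searches only a small neighborhood of the gold labeling, e.g. all labelings differing from it in at most k positions, instead of the full exponential output space.

```python
# Hedged sketch of the DecL idea: restrict loss-augmented inference to
# labelings near the gold labeling instead of the full structured space.
from itertools import combinations, product

def neighborhood(gold, label_set, k):
    """Yield all labelings that differ from the gold labeling in at most k positions."""
    yield tuple(gold)
    for r in range(1, k + 1):
        for positions in combinations(range(len(gold)), r):
            alternatives = [[l for l in label_set if l != gold[p]] for p in positions]
            for replacement in product(*alternatives):
                y = list(gold)
                for p, l in zip(positions, replacement):
                    y[p] = l
                yield tuple(y)

def decomposed_loss_augmented_inference(score, gold, label_set, k=1):
    """argmax of score(y) + Hamming(y, gold) over the restricted neighborhood,
    standing in for exact loss-augmented inference over the full output space."""
    def objective(y):
        return score(y) + sum(a != b for a, b in zip(y, gold))
    return max(neighborhood(gold, label_set, k), key=objective)

# Toy usage: 4 binary output variables and an arbitrary linear score.
score = lambda y: sum(i * v for i, v in enumerate(y))
print(decomposed_loss_augmented_inference(score, (0, 1, 0, 1), [0, 1], k=2))
```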

    Algorithms for structural learning with decompositions

    Get PDF
    Structured prediction describes problems that involve predicting multiple output variables with expressive and complex interdependencies and constraints. Learning over expressive structures (structural learning) is usually time-consuming, as exploring the structured space can be intractable. The goal of this thesis is to present techniques for structural learning that work by decomposing the problem space into simpler, tractable components. We consider three settings: fully supervised learning; unsupervised and semi-supervised learning; and discriminative latent variable learning, and present learning techniques for each case. For supervised structural learning, we describe a paradigm called Decomposed Learning (DecL), which decomposes the inference procedure during learning into small inference steps using additional application-specific information. For unsupervised learning, we propose a family of Expectation Maximization [Dempster et al., 1977] algorithms called Unified Expectation Maximization (UEM) [Samdani et al., 2012a] that covers several seemingly divergent versions of EM, e.g., hard EM. To efficiently add domain-specific declarative constraints into learning, we use a dual projected subgradient ascent algorithm that naturally decomposes the task into simpler components. In the discriminative latent variable scenario, we present a supervised latent variable model for clustering, the Latent Left-Linking Model (L3M), which can efficiently cluster items arriving in a streaming order. We decompose the learning process for L3M into small and efficient stochastic gradient descent steps that lead to rapid convergence.
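
    As a toy illustration of the left-linking idea behind L3M, the sketch below (Python; the function names, the hard greedy rule, and the fixed threshold are illustrative assumptions, not the thesis's probabilistic model or its learned scorer) assigns each arriving item either to the cluster of its best-scoring earlier item or to a new cluster.

```python
# Toy sketch of greedy left-linking clustering in the spirit of L3M
# (illustrative only; the actual model learns the pairwise scorer and
# treats left-links probabilistically rather than with this hard rule).
def left_link_cluster(items, pair_score, new_cluster_threshold=0.0):
    """Assign each streaming item to the cluster of its best-scoring
    predecessor, or start a new cluster if no predecessor scores above
    the threshold."""
    cluster_of = []        # cluster id per item, in arrival order
    next_cluster = 0
    for i, item in enumerate(items):
        best_j, best_s = None, new_cluster_threshold
        for j in range(i):  # only look "left", i.e., at earlier items
            s = pair_score(items[j], item)
            if s > best_s:
                best_j, best_s = j, s
        if best_j is None:
            cluster_of.append(next_cluster)
            next_cluster += 1
        else:
            cluster_of.append(cluster_of[best_j])
    return cluster_of

# Usage with a trivial similarity: items sharing a first letter end up together.
items = ["apple", "apricot", "banana", "blueberry", "avocado"]
pair_score = lambda a, b: 1.0 if a[0] == b[0] else -1.0
print(left_link_cluster(items, pair_score))  # [0, 0, 1, 1, 0]
```

    In L3M itself the pairwise scores come from a learned model and the left-links are handled probabilistically, with learning decomposed into the stochastic gradient steps described above.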

    Learning multi-linear representations of distributions for efficient inference

    No full text
    We examine the class of multi-linear representations (MLR) for expressing probability distributions over discrete variables. Recently, MLR have been considered as intermediate representations that facilitate inference in distributions represented as graphical models. We show that MLR is an expressive representation of discrete distributions and can be used to concisely represent classes of distributions which have exponential size in other commonly used representations, while supporting probabilistic inference in time linear in the size of the representation. Our key contribution is presenting techniques for learning bounded-size distributions represented using MLR, which support efficient probabilistic inference. We demonstrate experimentally that the MLR representations we learn support accurate and very efficient inference. Keywords: Learning probability distributions · Multi-linear polynomials · Probabilistic inference · Graphical models
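
    As a rough, hedged illustration of why inference can be linear in the representation size, consider a multi-linear polynomial stored as a list of (coefficient, partial assignment) monomials. The Python sketch below (with an explicit toy polynomial; a real learned MLR would be far more compact) computes the probability of evidence in one pass over the monomials, which amounts to evaluating the polynomial with evidence-consistent indicator variables set to 1 and inconsistent ones set to 0.

```python
# Hedged sketch of inference with a multi-linear representation of a
# distribution over binary variables; the data structure and names here
# are illustrative assumptions, not the paper's actual encoding.
def prob_of_evidence(mlr, evidence):
    """Sum the coefficients of all monomials consistent with the evidence,
    i.e., evaluate the polynomial with indicators for evidence-consistent
    values set to 1 and inconsistent ones set to 0."""
    total = 0.0
    for coeff, assignment in mlr:
        if all(assignment.get(v, val) == val for v, val in evidence.items()):
            total += coeff
    return total

# Toy distribution over X, Y with P(X=1)=0.6 and Y equal to X with prob 0.9:
mlr = [
    (0.54, {"X": 1, "Y": 1}),
    (0.06, {"X": 1, "Y": 0}),
    (0.04, {"X": 0, "Y": 1}),
    (0.36, {"X": 0, "Y": 0}),
]
print(prob_of_evidence(mlr, {"Y": 1}))          # marginal P(Y=1) = 0.58
print(prob_of_evidence(mlr, {"X": 1, "Y": 1}))  # joint P(X=1, Y=1) = 0.54
```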

    Unified expectation maximization

    No full text
    We present a general framework containing a graded spectrum of Expectation Maximization (EM) algorithms called Unified Expectation Maximization (UEM). UEM is parameterized by a single parameter and covers existing algorithms like standard EM and hard EM, constrained versions of EM such as Constraint-Driven Learning (Chang et al., 2007) and Posterior Regularization (Ganchev et al., 2010), along with a range of new EM algorithms. For the constrained inference step in UEM we present an efficient dual projected gradient ascent algorithm which generalizes several dual decomposition and Lagrange relaxation algorithms popularized recently in the NLP literature (Ganchev et al., 2008; Koo et al., 2010; Rush and Collins, 2011). UEM is as efficient and easy to implement as standard EM. Furthermore, experiments on POS tagging, information extraction, and word alignment show that often the best performing algorithm in the UEM family is a new algorithm that wasn't available earlier, exhibiting the benefits of the UEM framework.
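
    A minimal sketch of the unconstrained E-step in this spirit is given below (Python; it assumes the single parameter, written gamma here, tempers the posterior, and it omits the constrained projection handled by the dual projected gradient ascent algorithm). Under that assumption, gamma = 1 reproduces the standard EM posterior, gamma = 0 gives hard EM, and large gamma flattens the posterior toward uniform.

```python
# Hedged sketch of a UEM-style unconstrained E-step as a tempered posterior:
# q(y) proportional to p(y | x)^(1/gamma).  Parameter name and interface are
# illustrative assumptions, not the paper's code.
import numpy as np

def uem_e_step(log_posterior, gamma):
    """Return q over discrete latent assignments from unnormalized log p(y|x)."""
    log_posterior = np.asarray(log_posterior, dtype=float)
    if gamma == 0.0:  # hard EM: a point mass on the most likely assignment
        q = np.zeros_like(log_posterior)
        q[np.argmax(log_posterior)] = 1.0
        return q
    scaled = log_posterior / gamma
    scaled -= scaled.max()            # stabilize the softmax
    q = np.exp(scaled)
    return q / q.sum()

log_p = np.log([0.7, 0.2, 0.1])
print(uem_e_step(log_p, 1.0))   # standard EM posterior: [0.7, 0.2, 0.1]
print(uem_e_step(log_p, 0.0))   # hard EM: [1.0, 0.0, 0.0]
print(uem_e_step(log_p, 5.0))   # close to uniform
```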